
Fahim Tajwar

Maximum Likelihood Reinforcement Learning

Feb 02, 2026

Expanding the Capabilities of Reinforcement Learning via Text Feedback

Feb 02, 2026

Reasoning as an Adaptive Defense for Safety

Jul 01, 2025

Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation

Jun 09, 2025

Can Large Reasoning Models Self-Train?

May 27, 2025

Training a Generally Curious Agent

Feb 24, 2025

Self-Regulation and Requesting Interventions

Feb 07, 2025

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

Oct 12, 2023

Conservative Prediction via Data-Driven Confidence Minimization

Jun 08, 2023